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ABSTRACT 

It is widely believed that maximum likelihood estimators must be used to provide 
optimal estimates of power spectra. Since such estimators require the inversion and 
multiplication of x Nj^ matrices, where is the size of the data vector, maximum 
likelihood estimators require at least of order operations and become computa- 
tionally prohibitive for greater than a few tens of thousands. Because of this, a 
large and inhomogeneous literature exists on approximate methods of power spectrum 
estimation. These range from manifestly sub-optimal, but computationally fast meth- 
ods, to near optimal but computationally expensive methods. Furthermore, much of 
this literature concentrates on the power spectrum estimates rather than the equally 
important problem of deriving an accurate covariance matrix. In this paper, I con- 
sider the problem of estimating the power spectrum of cosmic microwave background 
(CMB) anisotropics from large data sets. Various analytic results on power spectrum 
estimators are derived, or collated from the literature, and tested against numerical 
simulations. An unbiased hybrid estimator is proposed that combines a maximum 
likelihood estimator at low multipoles and pseudo-C^ estimates at high multipoles. 
The hybrid estimator is computationally fast {i.e. it can be run on a laptop com- 
puter for Planck sized data sets), nearly optimal over the full range of multipoles, and 
returns an accurate and nearly diagonal covariance matrix for realistic experimental 
configurations (provided certain conditions on the noise properties of the experiment 
are satisfied). It is argued that, in practice, computationally expensive methods that 
approximate the 0{N^) maximum likelihood solution are unlikely to improve on the 
hybrid estimator, and may actually perform worse. The results presented here can be 
generalised to CMB polarization and to power spectrum estimation using other types 
of data, such as galaxy clustering and weak gravitational lensing. 

Key vifords: Methods: data analysis, statistical; Cosmology: cosmic microwave back- 
ground, large-scale structure of Universe 



1 INTRODUCTION 

The estimation of power spectra from various types of data is becoming increasingly important in cosmology. There are three 
main reasons for this. Firstly, there has been an explosion in the size and quality of cosmological data sets, for example, 
the detailed maps of the CMB sky from WMAP* (Bennett et al. 2003), the 2dFGRS (Colless et al. 2001) and SDSS galaxy 
redshift survey (Zehavi et al. 2002; Tegmark et al. 2004). With such large data sets it is now possible to measure power spectra 
accurately over a wide range of angular and spatial scales. Secondly, the power spectrum is a simple two-point statistic and 
so it is natural that more effort has been devoted to optimal methods of power spectrum estimation rather than to optimal 
estimators for higher order statistics. Thirdly, in most realisations of the inflationary scenario, the fluctuations generated 
during the inflationary phase are predicted to be Gaussian (see e.g. Liddle and Lyth, 2000, for a review). If the fluctuations 
are Gaussian, and as long as they remain linear, all of the information pertaining to the fluctuations is contained in the power 

* WMAP: Wilkinson Microwave Anisotropy Probe; 2dFGRS: Two Degree Field Galaxy Redshit Survey; SDSS: Sloan Digital Sky Survey. 
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spectrum. In this case, estimation of the power spectrum can be viewed as a form of lossless data compression, reducing, for 
example, lO'^ measurements of the CMB temperature differences AT, into 2000 or so values of the angular power spectrum 
(sec e.g. Tegmark 1997). This form of data compression is invaluable, since physical parameters of interest (matter densities, 
spectral indices etc) can be estimated from the power spectrum and its covariance matrix, rather than from the pixel values 
themselves. 

One of the earliest applications of power spectrum analysis to cosmology was Yu and Peebles' (1969) analysis of the 
angular distribution of rich clusters of galaxies. These authors apphed what is now (unfortunately) commonly referred to 
as a 'pseudo-C^' estimator (see Section 2). In the rest of this paper, we will discuss the specific problem of power spectrum 
estimation of CMB temperature anisotropics, although many of the results and remarks arc applicable to other types of data, 
such as galaxy clustering. A comprehensive analysis of pseudo-C^ estimators was given by Peebles (1973) and an application 
to the angular clustering of galaxies can be found in Peebles and Hauser (1974). Although it was well known that pseudo-C^ 
estimators are sub-optimal, little work was done to develop more optimal power spectrum estimators until recently (the paper 
by Feldman, Kaiser and Peacock, 1994, on power spectrum analysis of redshift surveys is a noteable exception). Hamilton 
(1997a, b), Tegmark (1997) and Bond, Jaffe and Knox (1998) constructed estimators that are optimal, in the sense that they 
find the power spectrum that maximises the likelihood function given the data. However, there is a fundamental problem with 
these maximum likelihood estimators. Given a data vector of length Nd, optimal estimators require the inversion of Nd x Nd 
matrices. Since a matrix inversion requires 0{Nd) operations, brute force application of maximum likelihood estimators 
becomes impractical for Nd greater than a few tens of thousands^^ (see e.g. Borrill 1999a, b). As a result, a large number of 
papers have appeared in the last few years that discuss various solutions to the problem of power spectrum estimation from 
large data sets. 

These papers can be grouped, roughly, into the following categories: 

(i) Pseudo-Ce estimators: These are straightforward variants of the methods described by Peebles (1973). A discussion of a 
pseudo-C^ estimator (hereafter PCL) applied to CMB experiments is given by Wandelt, Hivon and Gorski (2001) and Hivon 
et al. (2002). Variants designed to recover the true shape of the power spectrum from apodised regions of the sky are discussed 
by Hansen, Gorski and Hivon (2002) and Hansen and Gorski (2003). Those estimators can be evaluated using fast spherical 
transforms"!" {e.g. Muciaccia, Natoli and Vittorio 1997) for which the number of operations scale as N'd'^ ■ Even for data vectors 
of the length expected from the Planck mission (Bersannelli et al. 1996), Nd ~ lO'^, it is possible to evaluate a PCL estimator 
using a laptop computer (see e.g. Balbi et al. 2002). A PCL estimator was used in the analysis of the WMAP data (Hinshaw 
et al. 2003) 

A related class of sub-optimal estimators use fast evaluation of the two-point correlation function, which can then be 

transformed to give an estimate of the power spectrum. Methods of this type, designed for the analysis of CMB anisotropics 
are described by Szapudi et al. (2001a) and by Szapudi, Prunet and Colombi (2001b). A generalisation of these methods to 
the analysis of CMB polarization is discussed by Chon et al. (2003). We group this type of estimator along with the PCL 
methods described in the previous paragraph because they are almost mathematically equivalent. 

(a) Maximum likelihood estimators: Given a data vector Xi of length Nd, and assuming that the Xi are Gaussian distributed, 
we can define a likelihood function 



where S is the signal covariance matrix and A'^ is the noise matrix. A maximum likelihood estimator finds the Ci, or an 
approximation to the C(, that maximises the likelihood (1). 

Two types of maximum likelihood (herafter ML) estimator have been discussed widely. In the first, which we will refer to 
as NRML, the power spectrum that maximises the likelihood function is found iteratively using a Newton-Raphson algorithm 
(Bond, Jaffe and Knox 1998; Ruhl et al. 2003). In the second, which we will refer to as QML, a quadratic estimator is defined 
based on an assumed form for Ci which is equivalent to a maximum likelihood solution if the guess for Ci is close to the true 
power spectrum {e.g. Tegmark 1997). From equation (1) one can see that both the NRML and QML estimators require the 
inverse and determinant of the covariance matrix C and so require of 0{Nd) operations. 

Various methods have been proposed to speed up the computation of ML estimators. Oh, Spergel and Hinshaw (1999) use 
a conjugate gradient algorithm with a carefully chosen preconditioner to avoid direct invertion of the matrix (2) . Dore, Knox 
and Peel (2001) propose an NRML estimator based on hierarchical decomposition of a large map into sub-maps of various 

t Perhaps 100, 000 if one has access to a supercomputer, 
■t Or Fast Fourier Transforms in three dimensions. 



^^^'1''^ " (27r)^<i/2|C|V2 




(1) 



where Ci is the power spectrum to be estimated. The covariance matrix C in (1) is 



dj = {xiXj) = Sij{Ce) + Nij, 



(2) 
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resolutions, though this method has the disadvantage of a loss of resolution in the power spectrum estimates. An iterative 
multi-grid method is described by Pen (2003) and a method that uses Monte Carlo Markov Chains is described by Jewell, 
Levin and Anderson (2002). 

(iii) Harmonic and Ring-torus Estimators: For a Planck-type scanning strategy, a ML solution for the spherical harmonic 
coefficients aim can be determined from the Fourier transform of the temperature differences on a set of rings on the sky 
(van Leeuwen et al. 2002; Challinor et al. 2002). The data vector Xi in equation (1) is then the set of harmonic coefficients 
aim- This type of 'harmonic' method avoids the need to construct pixelised maps of the temperature differences on the 
sky and can account for asymmetric beams and certain types of correlated noise. However, a brute force application of the 
harmonic method would require of ©(^max) operations, where ^max is the highest multipole under consideration, and so is 
not computationally feasible for high multipoles. The operation count can be reduced for simple scanning strategies and by 
using conjugate gradient techniques, but so far little work has been done along these lines. A related method is described by 
Wandelt and Hansen (2003), who show that for special scanning geometries the time-ordered data (TOD) can be wrapped 
onto a ring torus. In this case, the ML solution for Cc can be found in 0{N^) operations using the fast spherical convolution 
algorithm described by Wandelt and Gorski (2001). Even if the scanning geometry of an experiment does not satisfy the 
precise conditions for an O(AfJ) solution, Wandelt and Hansen argue that the ring-torus method can be useful in determining 
a preconditioner to speed up the computation of the exact ML solution. 

Given the large number of methods described in the previous paragraphs, what is the best way to estimate a power 
spectrum from a large data set, including realistic sources of errors? In the view of this author, the work summarised above 
suggests two approaches: § 

(A) Application of a computationally expensive ML method that can account for various sources of error, such as correlated 
receiver noise, beam asymmetries etc. For a large {Nd XL 10^) data set, the noise matrix N in equation (2) is neither calculable 
nor storeable, and so if correlated noise is to be taken into account the method would have to be based on (and exploit 
symmetries in) the TOD. 

(B) Application of a fast PCL-based method which can be used to derive an analytic covariance matrix under certain 
simplifying assumptions {e.g. diagonal noise matrix, symmetric beams). Real-world complications can then be treated as 
perturbations to the covariance matrix, which can be estimated by running a large number of Monte Carlo simulations. 

The point of view adopted in this paper is that approach (B) is preferable to approach (A) and that there has been 
an over-emphasis in the literature on computationally expensive maximum likelihood methods. There are several reasons for 
taking this viewpoint: 

(a) Does a maximum likelihood estimator lead to a significant reduction in the error bars compared to using a PCL estimator? 
In some limits it is possible to prove that a PCL is statistically equivalent to a ML estimator (Sections 3 and 5). Furthermore, 
for realistic experiments there may be 'irreducible' errors on the power spectrum estimates that dominate over any minor 
differences between estimators. For example, in the WMAP experiment beam calibration uncertainties are the dominant 
source of error in the range 100 ^ i ^ 600 and residual Galactic emission is the dominant source of error at multipoles ^ ^ 10 
(Hinshaw et al. 2003). 

(h) As discussed above, for many purposes, a power spectrum is simply a convenient form of data compression to enable 
fast estimation of a small number of physical parameters. But to determine the physical parameters we must be able to 
write down a likelihood function, which requires at least an accurate covariance matrix for the power spectrum estimates and 
preferably an accurate model for the shape of the likelihood function (sec e.g. Bond, Jaffe and Knox, 2000). There is little 
point in applying a computationally expensive method of power spectrum estimation, which may only produce a marginal 
improvement in the estimates themselves, if the method returns a poor estimate of the covariance matrix. The estimation of 
an accurate covariance matrix is of comparable importance to the estimation of the power spectrum. 

(c) Following on from point (b), we must be able to demonstrate that any power spectrum estimation method produces 
negligible, or acceptable, bias on the physical parameters of interest. It is not enough simply to demonstrate that a method 
recovers the power spectrum with a bias that is small compared to the variance of the estimator, since small correlated errors 
can cause a significant bias in the estimation of physical parameters (see Efstathiou and Bond, 1999, Section 5). 

(d) A fast PCL method is much more flexible than a computationally expensive ML method. For example, it may be feasible 
to implement an ML method by exploiting symmetries appropriate to a particular scanning pattern. However, it may then 
be difficult to adapt the method to another scanning strategy. A realistic experiment will contain intentional and unforseen 
gaps in the TOD which might be difficult to handle in an ML method. In a ML method based on TOD, it may be difficult to 

§ After this paper was submitted for publication, Wandelt Larson and Lakshminarayanan (2003) have described an ingeneous metliod 
which does not fit into either of tlie two approaches described here. Their method, which is related to the method of Jewell et al. (2002), 
can recover the joint posterior distribution of the CMB power spectrum and signal directly from the TOD using Gibbs resampling . 
The method is potentially very powerful because it can account for complex scanning patterns and noise properties and furthermore 
scales as N^^"^ , though with a large prefactor. Application of this technique to Planck size data sets poses a challenging, but perhaps not 
insurmountable, computational problem. 
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deal with foreground component separation and Galactic sky cuts, or the method may impose significant restrictions on the 
algorithms used for component separation. These types of complexity can be incorporated easily into a PCL method, using 
Monte Carlo simulations if necessary. 

The point of view adopted here is not new. It has been argued before by Hivon et al. (2002) and is implicit in the analysis 
of the WMAP experiment (Bennett et al. 2003; Hinshaw et al. 2003). The purpose of this paper is to collate and derive 
various results that provide additional, and in the view of this author compelling, support for this viewpoint. In the first part 
of this paper (Sections 2-4) we consider the idealised case of a CMB experiment free of instrumental noise. PCL estimators are 
discussed in Section 2 and expressions for the covariance matrix are derived and tested against simulations. A QML estimator 
is discussed in Section 3 and is shown to be statistically equivalent to a PCL estimator at high multipoles. An unbiased hybrid 
estimator is proposed in Section 4 that combines a QML estimator at low multipoles and a PCL estimator at high multipoles. 
It is argued that this hybrid estimator, which can be evaluated on a laptop computer and returns an accurate estimate of the 
covariance matrix, is virtually indistinguishable from a full ML solution. Inclusion of instrumental noise is discussed in Section 
5 and the hybrid estimator is generalized to include more than one PCL estimate with different pixel weights. In the presence 
of instrumental noise, it is argued that a fast hybrid estimator can be defined that is close to optimal for all multipoles. This 
is demonstrated by applying a hybrid estimator to a high resolution simulation of the CMB sky (0.2° pixels), including a 
Planck-type scanning pattern. The conclusions are presented in Section 6, together with a discussion of possible limitations 
of the hybrid estimator and areas where further work is required. 



2 ESTIMATION USING PSEUDO-C^ 
2.1 The Pseudo-C^ estimator 

We first review some basic results and establish some notation. The spherical harmonic transform of a map ATj is defined as 

aim = '^^^TiWiQ.iYi>m{9i), (3) 

i 

where Qi is the area of pixel i and Wi is an arbitrary weight function with spherical transform 

Wim = ^ Wi Q.i Yijn (Oi). (4) 

i 

The aim coefficients are related to the true aem coefficients on the uncut sky by a coupling matrix K, 

(lem = ^^di'm' Klrne'm' , (5a) 
e'm' 

where 

f. -\-,r, A2^1 + l)(2^2 + l)(2£3 + l) V^' ...m,( .smj ^1 ^2 4 W ^1 ^2 4 \ 

Ke,m,e,m,- }^w,,mA ^ 1 (-1) (-1) ( Q Q oil mi -m. -ma i ' ^^^^ 

isms ^ ' \ / \ / 

(see e.g. Hivon et al. 2002). An alternative expression for K, in terms of a sum over pixels is 

Jf^imi^jms = ^Wifizy«imi(6'i)y«2m2(^<)- (6) 

i 

A PCL estimator is constructed from the sum 

m 

Since the weight function Wi is unspecified, equation (7) defines a set of PCL estimators. As is well known (Peebles 1973; 
Hivon et al. 2002), the expectation value of (7) is related to the true power spectrum Ct by a convolution 

{(7J>=^Q,M,,,, (8) 
where the coupling matrix can be expressed in terms of 3j symbols as 

2 

(9) 



^^ /o,. , (2^3 + l),j, / £i ii h\ 

M,,,2 = (2^2 + 1)^ ^^^1^,3 j 



^3 

and Wi (the 'window function') is the power spectrum of the weighting function w^m, defined in an analogous way to equation 
(7). The summation in equation (9) will appear several times in this paper, so to simplify the notation we define 
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/ 

Figure 1. The points joined by the dotted hne show the average values of the PCL estimates, from 10"'' simulations using equal 
weights per pixel outside a Galactic cut of ±10°. The points joined by the solid line show average values of the deconvolved estimates 
Cj. The solid line shows the input theoretical power spectrum and the dotted line shows this spectrum convolved with the coupling 
matrix M (equation 8). 



In addition, for some purposes it is useful to define the explicitly symmetric coupling matrix G, 

If the coupling matrix Me^e^ is invertible, unbiased (deconvolved) estimates CJ of the true power spectrum can be 

reconstructed via 

C^ = M-,}C^,. (12) 

Evidently, provided Mt^t^ is invertible, unbiased estimates of Ce can be recovered independently of the weights Wi. The only 
effect of the weights is to change the covariance matrix and hence we should seek weights that minimise the errors. 

In the analysis of CMB experiments, a region close to the Galactic plane is often excised from a map to reduce contam- 
ination of the primordial CMB signal from Galactic emission. If the Galactic cut is small enough, then the coupling matrix 
M will be invertible. As a rough rule of thumb, the matrix M will be invertible if the two-point correlation function C{0) can 
be determined on all angular scales from the data within the uncut sky (Mortlock, Challinor and Hobson 2002). In practice 
this means that quite large sky cuts (up to ~ 30° above and below the Galactic plane) can be imposed on the data before the 
matrix M becomes singular (see also Efstathiou 2004). In contrast, for a finite sky cut, the matrix K(^£^i defined in equation 
(5a) will be singular, since some linear combinations of the aem will define modes that lie within the cut sky. Obviously, these 
'null modes' cannot be recovered from the observed atm- 

Figure 1 shows an example of PCL estimation applied to simulated CMB skies. Throughout this paper we will adopt 
the CMB power spectrum of the concordance ACDM model favoured by WMAP (Spergel et al. 2003). The exact parameters 
adopted are those of the fiducial ACDM model defined in Section 2 of Efstathiou (2003). For the tests shown in Figure 1, 
10® simulations were generated with ^max = 200, a Gaussian beam with FWHM of 6s = 1.0° and a pixel size of 0.5°. The 
simulations described in this paper use an 'igloo' pixelization scheme (identical pixels on lines of constant latitude, see Figure 
7 below and Crittenden and Turok 1998) and fast spherical transforms based on software developed by the author for the 
Planck Phase A study (Bersanelli et al. 1996). The simulations are noise-free and a Galactic cut of ±10° was imposed. Unless 
otherwise stated, beam functions will not be written explicitly in equations, thus will sometimes mean C^fef . Since the 



2 
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symmetric beams are used in all of the simulations in this paper including beams simply involves multiplying theoretical 
power spectra, and dividing estimated power spectra, by the beam function tif. In all of the figures in this paper, the power 
spectra are divided by Tq , so that they are dimensionless, where To = 2.275 is the mean CMB temperature (Mather et al. 
1999). 

The points connected by the dotted line in Figure 1 show the average, over 10* simulations, of the PCL estimates 

computed from equation (7) using equal weight per pixel in the region outside the Galactic cut. The points joined by the solid 
hne show the average of the deconvolved estimates Cg (equation 12). Neither of these estimates has been corrected for the 
Gaussian beam. These results are shown to demonstrate that it is possible to run large numbers of simulations of the PCL 
method without detecting any significant bias. Issues related to the finite pixel sizes, and the evaluation of discrete spherical 
transforms (Crittenden and Turok 1998; Gorski et al. 1999; Doroshkevich et al. 2004), can be kept under control by choosing 
pixels that are small enough and so will not be discussed further in this paper. 

We end this sub-section with some remarks concerning PCL estimation when the coupling matrix M is singular. If the 
power spectrum is to be used to estimate physical parameters, then clearly there is no need to construct a deconvolved PCL 
estimate from equation (12). The PCL estimate (7) and its covariance matrix (Section 2.2) can be used to construct a likelihood 
function without any loss of information. If the matrix M is singular, then any attempt to compute an approximation to the 
deconvolved estimate will entail loss of information and will provide a poorer approximation to the likelihood function than 
the convolved estimates CJ. Nevertheless, there may be 'cosmetic' reasons for wanting an approximation to Cg, for example, 
to compare different experiments on the same plot. In this case, one can solve for band-power averages of £{l + l)Cf over 
ranges in i comparable to the width of the window function Wi (see e.g. Bond et al. 1998). Alternatively, one can introduce 
an invertible regularising matrix R and compute 

RCJ = {R-^M)-'^C^ (13) 

for each experiment. If R is chosen to have a width comparable to the window functions We of the two experiments, the 
product R~^M should be invertible. 



2.2 Covariance matrix of the pseudo-C< estimator 

Starting from the defininition of the PCL estimator (equations 3 and 7) it is straightforward to show that the covariance 
matrix is given by 

(AC^AC^,) = (2^ + 1)(2^' 4- 1) 5Z ^ ^ C!e^Ce2Kemeim,iKe'^,e^rniKeme2m2Ke'm'e2m2- (14) 

mm' timi ^2^^2 

As it stands, equation (14) is not useful, but it can be simplified for high multipoles (greater than the width of the window 
function We). If the Galactic cut is narrow, then we can replace Cei and C^j with Ce and Ce' and then apply the completeness 
relation for spherical harmonics {e.g. Varshalovich, Moskalev and Khersonskii 1988). This gives 

(AC^AC^,) = Vee' « 2CeCe'E{£,£' , W^^^), (15a) 
where is the power spectrum of the square of the weight function Wi, 

= (^^I^^Sl'' = Y,^\^^^im{Bi). (15b) 

i 

If Wi is a simple mask with values of 1 or 0, then w^ =Wi, Wj; = We and so the covariance matrix can be written in terms 
of the symmetric coupling matrix Geie2 defined in equation (11) 

(AC^AC^g,) ^ 2CeCe>Gee,. (16) 
The covariance matrix of the deconvolved PCL estimates (equation 12) is therefore 

(ACf ACJ,) = M-^ViM-Y- (17) 

From this equation, and using the general form of V for arbitrary weights, we can differentiate the variance {AC^ AC^) with 
respect to Wi. After some algebra, we can prove that for high multipoles, the variance of in a noise- free experiment is 
minimised if we adopt equal weight per pixel. The estimates shown in Figure 1 are therefore the minimum variance PCL 
estimates for a noise-free experiment. It is possible to define a class of estimators that use a weight function that maximises the 
resolution of the power spectrum estimates from an incomplete sky (Tegmark 1996a, b). However, these require the inversion 
of Nd X Nd matrices and will not bo discussed further here. 

Figure 2 compares the covariance matrices determined from the simulations described in Section 2.1 (left hand panels) 
with the analytic expressions of equations (15a) and (17). For a Galactic cut of ±10°, the analytic approximations are accurate 
at £ ^ 20, where the covariance matrix is very nearly band-diagonal, but fail at lower multipoles. At £ ^ 20, even the diagonal 
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Figure 2. The upper panels, (a) and (b), show the covariance matrices for the PCL estimates and the lower panels, (c) and (d), 
show the covariance matrices for the deconvolved PCL estimates Cj. Plots to the left show the covariance matrices averaged over the 
10^ simulations described in Section 2.1; plots to the right show the show the analytic approximations of equations (16) and (17). The 
covariance matrices have been multiplied by £^^2 ^° that the diagonal components have roughly equal amplitude and to amplify the 
off-diagonal elements. 



components of the analytic covariance matrices differ (by up to 25% sX 1 = 2) from the covariance matrices determined from 
the simulations (see Figure 12 of Section 5.4). The off-diagonal components differ by much larger factors and so the analytic 
approximations at low multipoles are useless for any quantitative application such as parameter estimation. 

This can be seen in Figure 3, which shows a similar comparison to that shown in Figure 2, but using 2 x 10^ low resolution 
simulations with pixel size 9c = 5° and Gaussian beam FWHM of 7°. However, for such a low resolution map, it is possible 
to evaluate an exact expression for the covariance matrix. This can be done conveniently by computing 

mm' 

where Birni'm' is evaluated by summing over pixels 
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Figure 3. As Figure 2, but for 2 X 10® low resolution simulations with pixel size 6^ = 5° and beam smoothing of 7°. The differences 
between the covariance matrices estimated from the simulations and the analytic expression (equation 15a) are now easily seen. 



Bimt'^' = ^w,wjQ.^njC{e,,)Yi^{ei)Y;,^,{ej). (19) 

In equation (19), C{9) is the temperature autocorrelation function 

C{e,,) = (AT, AT,) = -3- V(2^ + l)C^P£(cos6i). (20) 

e 

A comparison of the simulations with the exact expression (18) is shown in Figure 4. As expected, the agreement is almost 
perfect and any residual differences are simply caused by noise from the finite number of simulations. 

The procedure for estimating an accurate covariance matrix for a PCL estimator is therefore as follows: 

(i) Adopt a (smooth) theoretical model for the power spectrum Ce. (This can be done iteratively, with an intermediate 
parameter estimation step if required). 

(ii) Compute the covariance matrix for PCL estimates from the full resolution map using the approximate expression (15a). 
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Figure 4. As Figure 3, but now comparing with the exact expression for the covariance matrices (equation 18). 



(iii) Use the exact expression (18) to compute the covariance matrix at low multipoles by summation using a lower resolution 
pixelization. 

(iv) Replace the elements of the covariance matrices at low multipoles (£i < Ij, £2 < ^.7) computed in step (ii) with the 
elements computed in step (iii). 

There are a number of subtleties involved in this procedure. For example, what resolution should be used in step (iii)? 
How does one choose the 'joining' multipole £j in step (iv)? What should one use for the final covariance matrix in the ranges 
(£1 < £j, £2 > £j) and {£2 < ij, ii > £j) which are not computed in step (iii)? In the example discussed in Section (4), the 
joining multipole £j is simply chosen by trial and error so that the covariances matrices computed in steps (ii) and (iv) agree 
to high accuracy (a few percent or better). The components of the covariance matrices at {£1 < £j, £2 > £j) and (£2 < £j, 
£1 > £j) are simply retained, but their values are so low that for most purposes it would make no difference if these values 
were set to zero. None of these subtleties are critical, and the method is so fast that many choices can be explored to assess 
the accuracy of the method. As long as the sky cut is not too large, the method described here can return a deconvolved 
PCL estimate CJ, together with an accurate covariance matrix. If the sky cut is large, then the method can return a PCL 
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estimate (or 'regularised' estimate, equation (13)) together with an accurate covariance matrix. Including uncorrelated 
instrumental noise poses no fundamental problems, and is discussed in Section 5. 



3 MAXIMUM LIKELIHOOD ESTIMATION 
3.1 NRML Estimators 

In the NRML method (Bond et al, 1998) a set of parameters Op, which may be estimates of the power spectrum Ce, is 
determined from the likelihood function (1) by Newton-Raphson iteration. First, guess a set of values for the parameters Oj, 
and then refine these by adding a correction 



(21a) 



where F (the Fisher matrix) is the expectation value of the curvature matrix 

(21b) 



dapdttpi / 2 



^-lac^-i dc 

dan da^i 



This leads to a new set of values Op and the computation can be repeated until the estimates Up converge. An estimate of the 
covariance matrix for the parameters Op is given by the inverse of the Fisher matrix at the final iteration 

{6ap5ap,)=F;J. (22) 

An alternative, and computationally more expensive NRML method uses the full curvature matrix instead of F in equation 
(21a) (Borrill 1999b). The NRML estimator has been used widely in the analysis of CMB experiments, for example, MAXIMA- 
1 (Hanany et al, 2000), BOOMERANG (de Bernardis et al, 2000), WMAP (Hinshaw et al, 2003) and to galaxy clustering 
(Efstathiou and Moody 2001). 

There are, however, some disadvantages in using the NRML method. If the method is to be used to estimate power 
spectra from a map with a large sky cut, the Fisher matrix F((i will be numerically singular (i.e. will have a high condition 
number, see Press et al., 1992). In this case, the iterations (21a) will not converge. This problem can be overcome by solving 
for band-power averages of the Ci (Bond et al., 1998). However, band-power averaging involves a loss of information and can 
produce biases in the final estimates of the power spectra^, depending on the choice of bands. Cosmological parameters are 
often estimated from the power spectrum estimates, C|, by minimising a x^-i 

X'(?p) = Y^iCTiqp) - Ct)Fu'{Cj,{qp) - a,), (23) 

with respect to the parameters qp, where Cj is the theoretical power spectrum. If the Fisher matrix (21b) is used in equation 
(23) then the parameters qp will be biased (Bond et al. 1998; Oh et al. 1999). This is because a low estimated value of C| will 
be assigned a low variance. This problem can be partially alleviated by smoothing the final estimates C| and rc-computing 
the Fisher matrix (Oh et al. 1999). However, it is easy to think of cases where such a procedure would fail catastrophically, 
for example, for the low quadrupole amplitude measured by WMAP (Bennett et al. 2003). To avoid bias, what is wanted 
in equation (23) is the Fisher matrix appropriate to the estimator for the theoretical model of interest, i.e. Fiii{Cf), and 
preferably an accurate model of the likelihood function rather than the simple of equation (23). Clearly, it is not feasible to 
evaluate equation (21b) for every set of physical parameters qp. One solution is to find a model for F((i which can be rescaled 
(or recallibrated) to a good approximation for any given theoretical model (sec e.g. Verde et al. 2003). This approach is, in 
part, motivation for the hybrid estimator developed in Sections 4 and 5. For another approach to this problem, see Gupta 
and Heavens 2002). 

Finally, since the NRML method iterates to a ML solution, the final estimates of C| arc not easily expressible in terms of 
the data vector x. In contrast, the QML estimator discussed in the next Section, as its name implies, is expressible in terms 
of products XiXj of the data vector. This property is critical in defining the hybrid estimator of Sections 4 and 5. 

3.2 QML Estimators 

QML estimators have been used extensively by Tegmark and collaborators (e.g. Tegmark et al. 1998; Hamilton, Tcgmark and 
Padmanabhan, 2000; Tegmark et al. 2002a; Tegmark, Hamilton and Xu, 2002b). Most other authors have used the NRML 

^ Even if there is no band-averaging, the final estimates of Ce will be biased since the method iterates to a maximum Ukelihood solution 
that is only assymptotically unbiased (Bond et al, 1998). 
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methods described in the previous sub-section, giving the impression that QML estimators are in some sense inferior to NRML 
estimators. One of the purposes of this sub-section is to show that this is impression is unjustified. As in previous Sections, 
instrumental noise is ignored to simphfy the discussion. Instrumental noise is discussed in Section 5.2. 

If the data vector Xi is composed of Gaussian random variates, a minimum variance quadratic estimator of the power 
spectrum is 

ye=XiXjEfj, (24a) 
where 

(Tegmark 1997). Evidently, a functional form for Ce (Q°) must be assumed to compute the matrix (24b). However, the 
expectation value of ye provides an unbiased estimator of the true Ce whatever the assumed form for CJ°, 

{ye)=Fu'Ce', (25) 

where F^e' is the Fisher matrix of equation (21b) with Op replaced by Ce- As it stands, the vector ye will (usually) look very 
different to the power spectrum. However, we can recover an estimate that has a similar shape to the true power spectrum 

simply by rcscaling 

The covariance matrix for the estimates ye is 

(27) 



{yeye')-{ye}{ye'} = 2Tr CE'CE'' 



where the C"s appearing in equation (27) are the true covariance matrices. If the assumed form of Ce is a good approximation 
to the true power spectrum, then equation (27) simplifies to 

{Aye Aye,) = Fee'. (28) 

To summarise, if the initial guess is a poor approximation to the true Ce, equation (25) provides an unbiased estimate of 
the power spectrum provided that the guess for Ce is used to compute the covariance matrices in the expressions for E and 
F (equations (24b) and (21b)). If the initial guess for Ce is poor, the resulting estimates ye, although providing unbiased 
estimates of Ce, will not be the ML solution and their covariances will be given by equation (27); the errors on the QML 
estimates, although accurate, will not be as small as those of a ML solution. If the initial guess for Ce is close to the true answer, 
then the estimates of ye will be close to the ML solution and their covariances will be given accurately by equation (28). In 
practice, finding an acceptable initial guess for Ce is unlikely to be a problem since the QML estimates at low multipoles are 
almost independent of the initial guess (see Section 3.3) and the CMB power spectrum at intermediate multipoles is already 
well estimated from WMAP. Henceforth we will assume that the initial guess for Ce is close enough to the true answer that 
equation (28) applies. 

If the Fisher matrix is non-singular (as is the case for a narrow Galactic cut) then equation (25) can be inverted to provide 
an unbiased QML estimate of Ce, 

Cl = F-}ye, (29) 
with covariance matrix 

{ACIACD = F-,}. (30) 

The matrix F is therefore an approximation to the Fisher matrix for the estimates C^. 

With this formulation, there is a close analogy between the PCL estimates C^ and (equations (8) and (12)) and the 
QML estimates CJ and Cl (eqations (26) and (29)). Both of the estimates C^ and C^ are related to the true power spectrum 
by convolutions. If a narrow Galactic cut is used, then these estimates can be deconvolved to form C^ and CJ. However, this 
deconvolution step is not essential, since CJ and CJ contain the same information as the deconvolved estimates C^ and C^. 



3.3 Fisher matrix for low multipoles 

Suppose that the data vector Xi consists of the spherical harmonic coefficients aern measured on the cut sky. These coefficients 
are related to the true harmonic coefficients by the coupling matrix Keme'm' (equation 5a). As discussed in Section 2.1, on 
a cut sky some combinations of the aem define 'null modes' that lie within the cut and hence the matrix K will be singular. 
However, if the Galactic cut is relatively narrow, the coefficients aem at low multipoles will be weakly correlated with any of 



© 0000 RAS, MNRAS 000, 000-000 




I, I, 

Figure 5. The figure to the left shows the covariance matrix for the quadratic estimator (equation 29) computed from 2 X 10^ 
simulations. The figure to the right shows the inverse of the Fisher matrix (equation 30). 



the airn that define 'null modes' (Figure 2). In this case, the expansion (5a) can be truncated at finite {£,m) and {£' ,m'), and 
the truncated matrix K for the cut sky can be inverted. (The invertibility of K can be used to construct orthogonal functions 
on the cut sky from the spherical harmonics Yim, Gorski 1994, Mortlock et al. 2002). 

If K is invertible, then we can reconstruct the approximations to the actual airn coefficients on the uncut sky by performing 
the matrix inversion a « K~^a. The QML estimates of Ci (in the absence of instrumental noise) then reduce to 

m 

and so return almost the exact value of Ci for our particular realization of the CMB sky in the presence of a Galactic cut (see 
also Efstathiou 2004). The Fisher matrix for the QML estimates is therefore simply 

(2^+1) 

reflecting cosmic variance. In other words, the QML estimates give cosmic variance limited estimates of the power spectrum 
independent of the size of a Galactic cut, provided that the cut is not too large. (For example, equation (32) is a good 
approximation at low multipoles for the Kp2 sky cut defined by the WMAP team but begins to break down for the KpO and 
larger sky cuts, see Efstathiou 2004). The usual formula 

{e.g. Knox 1995) where /sky is the fraction of the sky sampled by the experiment, clearly does not apply at low multipoles 
and one can think of the /sky factor in (33) as reflecting the loss of information at high multipoles from the missing 'null 
modes' that lie within the Galactic cut. 

The results of this subsection are borne-out by a set of 2 x 10^ numerical simulations. In these simulations we apply 
the quadratic estimator (24a) to the same set of simulations used to construct Figures 3 and 4, using the pixel ATi values 
as the data vector Xi. Figure 5 shows the resulting covariance matrix for the estimates Cl compared to the inverse of the 
Fisher matrix. There is some low-amplitude off-diagonal structure in both of the plots (as a consequence of the small coupling 
with 'null modes') but this is small and negligible for most purposes. The results from the simulations agree with equation 
(32) to within a few percent at multipoles 1^5) and rise smoothly to match approximately with equation (33) by £ = 30 
(see also Figure 14a below). Note that for the simulations used here, /sky = 0.83. The discrepancy between the analytic and 
simulation results at low I in Figure 5 is simply a consequence of the flnite number of simulations. The off-diagonal elements 
of the covariance matrix estimated from the simulations at low multipoles are small compared to the diagonal elements (note 
that the grey-scale in Figures 2-5 was chosen to accentuate small amplitude structure) and become smaller as the number of 
simulations is increased. 

The behaviour of the QML estimator differs from the PCL estimates CJ, which are highly correlated at low multipoles 
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(ATg)^ input (jjXf (ATg)^ input (jjXf 

Figure 6. The figure to the left shows the quadrupole amphtude estimated from the PCL estimator (ordinate) plotted against the 
quadrupolc amplitude of the simulation used to generate the map (abscissa). (Points for 1000 simulations are shown). The figure to the 
right shows the equivalent plot for the QML estimator. The error bar in Figure (6a) shows the cosmic variance error for the mean value 
of (AT2)2 = 1123 (/iK)2 for the input simulations. 



{e.g. Figure 4) and which have a variance that significantly exceeds cosmic variance. As an example, Figure 6 compares the 
amplitudes of the quadrupole estimates, expressed as 

{ATef = -^e{e + l)Ce, (34) 

compared to the input quadrupole amplitude in each simulation. The expectation value of the quadrupole amplitude for the 

simulations is (AT2)^ — 1123 (/iK)^ (after smoothing with the Gaussian beam function) and so the cosmic variance is expected 
to lead to a dispersion of 710 (/iK)^. Figure 6, shows that the 'estimator induced' dispersion in the PCL case (262 (/xK)^) is a 
significaiit fraction of the cosmic variance . For the QML case, the 'estimator induced' dispersion is much smaller (68 (/uK)^). 
These numbers depend on the size of the Galactic cut, but it is clear that for realistic Galactic cuts the QML estimator returns 
amplitudes for low multipoles that are close to the true values for our particular realisation in the sky. This can be important, 
e.g. the WMAP quadrupole estimated using a PCL estimator has an amplitude of only (AT2)^ = 123 (/uK)^, much smaller 
than the amplitude expected in the concordance ACDM cosmology (Bennett et al. 2003; Spergel et al. 2003). In contrast, 
the QML estimator applied to the WMAP internal linear combination map gives a quadrupole amplitude closer to 200 (/iK)^ 
(Efstathiou 2004). 



3.4 Fisher matrix at high multipoles 

In the pixel domain, the signal covariance matrix is 
{21 + 1) 



Sij = ^^^WiWjCePe{eij) = ^ CeWiWjYem{ei)YUOj), (35) 

where Wi is the (arbitrary) pixel weight function introduced in equation (3). If the data cover the whole sky (and the weight 
function Wi is everywhere non-zero), then from the orthogonality of the spherical harmonics, the inverse signal covariance 

matrix is given by 

= E 7^^Yem{ei)YUej). (36) 
em 

Although equation (36) applies strictly only for the complete sky, it will be a good approximation for a cut sky with i and 
j restricted to the pixels outside the cut, provided that the total number of pixels in the map is very much greater than the 
number of pixels close to the boundary. (This condition is satisfied in all of the examples discussed in this paper). The product 
C-^dC/dCe in (24b) is then 
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« a— y«m(ei)F;^(6'fc). (37) 

m 

The Fisher matrix (21b) is therefore 

Fee' « ^^^^Ye^iWrr.{Ok)Y,,^,iek)Y;,^,{e,). (38) 

ik mm' 

Notice that weight factors Wi have cancelled and do not appear in equation (38) . This is what is expected; if the data vector 
Xi is multiplied by arbitrary weights and a maximum likelihood estimator applied to estimate C'e, then neither the estimates 
nor their covariance matrix should depend on the weights. The derivation leading to equation (38) differs from that in 
Appendix D.1.2 of Hinshaw et al, (2003) which assumes equal weight per pixel. The final answer does, however, agree with 
equation (D19) of Hinshaw et al. (2003). 

Evaluating the summations in equation (38), we find 

(2£+l)(2/ + l) (2^ + 1) ,. 

where the matrices Gie' and Mee are the coupling matrices defined by equations (9) and (11) evaluated for unit weight per 
pixel over the uncut sky. 

Notice that for high multipoles, equation (17) for the covariance matrix of the deconvolved PCL estimates is equal to 

{^Cl^C-) « (40) 

which is equal to the inverse of equation (39). It therefore follows that: (i) in the signal dominated limit and (ii) for high 
multipoles {I greater than the width of the window function We), the deconvolved PCL estimate with equal weight per pixel is 
optimal and statistically equivalent to a maximum-likelihood solution. In the signal-dominated limit, there is therefore nothing 
to be gained by solving the 0{N^) computational problem to apply a ML method, since the much faster PCL estimate will 
give a result that is statistically equivalent. 



4 HYBRID ESTIMATOR 

Having established that the PCL estimator is very close to optimal at high multipoles, it is clear that a fast and almost 
optimal power spectrum estimate can be derived by combining a PCL estimate with a QML estimate applied to a degraded 
low resolution map. As an example, consider the maps shown in Figure 7. The map in the upper panel is one of the simulations 
used for the tests described in Section 2.1 (pixel size 1 1 = 0.5°, = 1°). The map in the lower panel was constructed from 
the same a^,„ coefficients with pixel size 6^ = 5° and smoothed to 6^ = 5°. Figure 7 plots the maps over the whole sky, but as 
in previous Sections a cut of ±10° above and below the notional Galactic plane has been applied in the tests described below. 
There are 1385 active pixels outside Galactic cut in the low resolution map and so it takes only a few seconds to compute all 
of the 0{N^) operations required for the QML estimator. 

From these maps, two estimates of the power spectrum can be formed, (up to a maximum multipole i = ip) and C*^ 
(up to maximum multipole £ = £q), together with accurate estimates of their covariance matrices, as discussed in previous 
Sections. The cross-correlation between the QML and PCL estimates at low multipoles is given by 

(ACI'Aq,) « j^ffj-^Me,, {AC^ACJ,) « j^ffj-)^'''' ^^^^ 

where we have assumed the approximate expression for the QML estimator of equation (31). Equation (41) has been used in 
the analysis described below and is a perfectly adequate approximation for i 30. If necessary, a more accurate expression 
for the cross-correlations can be evaluated from a sum over pixels, as in the analysis of the PCL covariance matrix described 
in Section 2.2: 

{AC^Aye) = ^vimz\i^Elq, (42a) 

mpq 

where 

Ziim = Yj C{Oij)wiQiYem{Oi). (42b) 



The pixel areas are identical on lines of constant latitude, but vary slightly with latitude. 
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Figure 7. The upper figure shows a simulated CMB sky with smoothing 8s = 1° pixelized into 0.5° X 0.5° pixels using an 'igloo' type 
pixelization scheme (164828 pixels in total). The lower figure shows the same map smoothed to 6s = 5° and pixelized into 1632 pixels of 
area 5° X 5°. 



Given expressions for the covariances and the cross-covariances, we can combine the two estimates and into a 
single data vector Cat {a = q,p) and form the 



X = {Call — Cei)^aeii3e2{Cf3i2 — Ci^), 



(43) 



where J-aiii3i2 is the inverse of the covariance matrix {^^Caix^Cpi^) ■ Minimising equation (43) gives the solution for the 
hybrid estimate Cg 



with covariance matrix 



(44) 



\ a0 / 



(45) 



The hybrid estimator Cg therefore makes a smooth transition between the QML estimates at low multipoles and the PCL 
estimates at high multipoles (and would be exactly equal to the quadratic estimates at low multipoles if the Fisher matrix 
were precisely diagonal). From the results presented in Sections 2.2 and 3.2, the covariance matrix for the hybrid estimates 
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Figure 8. The filled circles show the results of applying the QML estimator to the low resolution map shown in Figure 7. The error 
bars are computed from the diagonal components of the inverse of the Fisher matrix (equation 21b). Three rows of the Fisher matrix 
are shown in the panels to the right. The stars show the deconvolved PCL estimates determined from the high resolution map shown 
in Figure 7. The dotted line shows the fiducial ACDM power spectrum and the solid line shows the actual power spectrum computed 

from the a^,„ coefficients used to generate the maps. A sky cut of ±10° has been used for the estimated power spectra and the QML 
estimates have been corrected to a beam smoothing of 1° as used in the high resolution maps. 



(45) is expected to be prcdominently band-diagonal at all multipoles if the sky cut is relatively narrow. Furthermore, since 
each of the estimates and is an unbiased estimate of Ce, it follows from equation (44) that the hybrid estimator is 
unbiased. 

To illustrate the hybrid method, Figure 8 shows a comparison of the QML and PCL estimates at low multipoles derived 
from the maps shown in Figure 7. Three rows of the Fisher matrix for the QML estimates are shown in the panels to the right. 
As expected, for a sky cut of ±10°, the QML Fisher matrix is almost exactly diagonal at low multipoles and so the estimates 
agree almost perfectly with the power spectrum computed from the atrn for this particular simulation. By multipoles 
i ~ 10 the QML and PCL estimates become similar. 

The hybrid estimate computed from equation (44) are shown in Figure 9. In this example, the QML estimates were 
retained up to £q = 20. For the PCL estimator, the exact expression for the covariancc matrix (equation 18) was used for 
£ <20 and the approximate expression (equation 15a) was used for higher multipoles. Neither of these choices are critical, as 
can be seen from the close agreement between QML and PCL estimates plotted in Figure 8. 

The analytic covariance matrices for the PCL estimator (c/ Figure 2d) and the hybrid estimator (equation 45) are plotted 
in Figure 10 for multipoles i < 50. As expected, the covariance matrix for the hybrid estimator is almost band-diagonal. In 
Section 3.4 it was proved that the PCL estimator with equal weight per pixel is statistically equivalent to the QML estimator 
if instrumental noise can be neglected. The hybrid estimates plotted in Figure 9 should therefore be almost indistinguishable 
from a brute force 0{N^) ML analysis of the high resolution map shown in Figure 7. However, in contrast to the 0{N^) 
methods, computation of the hybrid estimator and its covariance matrix takes of order seconds on a laptop computer. 

The analysis of the hybrid estimator, as described above, applies if the estimators Cf and C*^ can be constructed from 
the estimates and C', i.e. if the Galactic cut is narrow enough. For a large Galactic cut, it may not be possible to 
contract and Cf. In this case, one cam simply use the hybrid of equation (43) to estimate cosmological parameters (or 
a suitably modified function to account for the deviations from Gaussianity at low multipoles) in place of the of equation 
(23). Or, equivalently, one can solve for a suitably regularized hybrid power spectrum, RCe, (equation 13) and its covariance 
matrix. There is no requirement to construct a deconvolved hybrid estimate C( and no loss of information in working with a 
regularised estimate. 
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Figure 9. The filled circles show the hybrid estimates for the maps shown in Figure 7 computed from equation (44). The error bars 
were computed from diagonal components of the covariance matrix (45). As in Figure 8, the dotted line shows the fiducial ACDM power 
spectrum and the solid line shows the actual power spectrum computed from the coefficients used to generate the maps. 



(a) pseudo estimator (b) hybrid estimator 




20 40 20 40 



I, I, 

Figure 10. The loft hand panel shows the analytic covariance matrix at ^ < 50 for the PCL estimates. The right hand panel shows the 
covariance matrix for the hybrid estimates computed from equation (45). 

5 INCLUDING NOISE 

Including uncorrelated noise poses no fundamental problems. In the first two subsections some of the formulae of the previous 
Sections are generalised, without detailed mathematical proof, to include uncorrelated noise. The motivation for applying a 
hybrid estimator to noisy data is given in Section 5.3, and an application to a high resolution simulation with a Planck-type 
scanning pattern is described in Section 5.4. For simplicity, we consider only the case of white noise in this Section, neglecting 
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the l//-type noise that is expected in a real experiment. Possible ways of handling 1/f noise are discussed briefly in Section 
6. 

5.1 PCL estimator including noise 

If noise is included, the PCL estimator (7) has expectation value 

{C^) = Y,Ce'Me(' + m, (46) 
{e.g. Hivon et al. 2002) where the noise power spectrum is, in general, 

iV^ = ^ ^ NijWiWj fiiOj Pt {Oij ) . (47) 

ij 

If the noise matrix is diagonal, Nij = afSij , equation (47) simplifies to 



An 



To form an unbiased PCL estimator of the power spectrum, then we simply subtract the noise term, C'/ = C'^ — Ne, and 
compute the deconvolved estimates = M'^^jC'^ . 
The covariance matrix of C'[ is 

(ACf AC;f ) = VU, = Vw + (ACf ACiy ) + Uw , (49) 

where Vu' is given by equation (15a). The noise covariance in equation (49) is 

(AC-f ACiy ) = 2H(£, /, W), (50a) 
where 

= f^2£ \ 1) w^m = "^crfwiQiYimiei), (50b) 

m i 

and the cross-covariancc Uw is given by 

Uw ^A{CiCef'^-E{i,i',W^''), (51a) 
where 

"^''^ = (2?TI) E )• (^1^) 

m 

The covariance matrix of the deconvolved estimates including noise is 

(AC'I'AC'J,) = M-^V'[M-^f, (52) 

with M as given in equation (9). All of the results quoted in this Section have been derived under the assumption that the 
window functions are narrow compared to variations in d, and the factor (C^C^/)^^'^ in equation (51a) has been written in 
this way to preserve the symmetry of the matrix V' . An equivalent result to equation (49) has been derived independently 
by Chon et al. (2003). 

5.2 QML estimator including noise 

For a general noise matrix Nij, we can define an unbiased QML estimate by redefining yi equation (24a) as 

y'l = XiXjElj - NijEij, (53) 

(Tegmark 1997). The expectation value of is then given by equation (25) with the Fisher matrix as defined in equation 
(21b) where C — S + N . The covariance matrix of the estimates y'^ arc given by equation (27). 
If the noise matrix is diagonal and dominates over the signal, Cij « Nij = afSij, then 

Ef,»^^^J^Pe{9i,), (54) 

* 3 

and the QML estimator is 

2ye = 2xiXjEij ^Yem{9i)YeUej) = {21 + l)Cl (55) 
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where is the PCL estimate using a weight function Wi = l/{afil,i). In other words, if the noise dominates and is diagonal, 
the QML estimator is mathematically equivalent to a PCL estimate computed with inverse variance weighting. In this limit, 
the expectation value of the QML estimates y'^ is zero and their covariance matrix is 

/A ' A ' \ I? (2£l + l)(2^2 . ,J,crx s 



where 
W[ = 



{21- 



w^m = Y.\Yemiei). (56b) 



Equation (56a) agrees with equation (D12) of Hinshaw et al. (2003). The covariance matrix of the QML estimates = F^^^y'^, 
is given by as in equation (30) and is identical to the covariance matrix of equation (52) if the weight function is set to 
Wi = l/{aiQi). If the noise is the same in each pixel and each pixel has the same area ( cr? = ap, Cli = fip), then the diagonal 
components of equation (56a) give the usual approximate formula {e.g. Knox 1995) 



5.3 Motivation for a hybrid estimator 

The results presented above and in Sections 2 and 3 show that: 

(I) in the signal dominated limit a PCL estimate with equal weight per pixel, Wi = 1, is statistically equivalent to a ML 
estimate. 

(II) in the noise dominated limit, and assuming that the noise matrix is diagonal, a PCL estimate with inverse variance 
weighting, Wi = l/{afCli), is mathematically equivalent to a ML estimate. 

For a realistic experiment such as WMAP, regime (I) is a good approximation at low multipoles and regime (II) is a 
good approximation at high multipoles, but in the intermediate regime it is difficult to derive a simple analytic formula for 
the optimal PCL weights. To do so, we would need an approximation to the inverse = {S + iV)^^, which appears in 
the Fisher matrix (equation 21b). In the harmonic domain S is sparse while A'^ is not, but in the pixel domain N is usually 
sparse while 5 in not. Hence when 5 and N are comparable, the matrix C is not sparse in either domain and so it is difficult 
to derive analytic approximations to C~^. The non-sparseness of C is the fundamental reason why it is difficult to solve the 
0{N^) problem in the numerical computation of ML estimators. 

One solution, adopted by the WMAP team (Hinshaw et al. 2003), is to use a heuristic weighting in the intermediate 
regime. In the analysis of the WMAP data, Hinshaw et al. (2003) adopt equal weight per pixel at £ < 200, inverse variance 
weighting, Wi = NohB{i) (where Nobs is the number of observations, or hit count, of pixel i) aX £ > 450, and a heuristic 
weighting of 

in the intermediate range 200 < I < 450, where (Nobs) is the average hit count over the unmasked sky. This is a reasonable 
solution, but has the disadvantage that the final power spectrum and associated covariance matrix are contracted from three 
estimators with discontinuous joins at £ = 200 and i = 450. It is worth mentioning here that if the noise matrix is assumed 
to be diagonal, the only benefit of solving the full 0{N^) ML problem at high multipoles is to derive a continuous power 
spectrum estimate that is optimal in the intermediate regime. 

The hybrid estimator discussed in Section 4 offers an alternative, but much faster, way of deriving a continuous power 
spectrum estimate between regimes (I) and (II), together with an accurate covariance matrix. Following Section 4, compute 
two PCL estimate Cg^'^ , and C^^^ , where the superscripts denote two different weight functions. The two PCL estimates 
can be combined with QML estimates at low multipoles to form a single data vector CaC, where a = q,pwi,pw2. Equations 
(44) and (45) can then be solved to give a hybrid estimate and its associated covariance matrix. This requires the cross- 
covariance of the two PCL estimates, which is straightforward to derive following the analysis of Sections 2.2 and 5.1; simply 
replace the terms in equations (15b), (50b) and (51b) by the product {wi)i{w2)i. 

If the weight function {wi)i is chosen to be equal weight per pixel, and weight function {w2)i is chosen to be {w2)i — 
l/{affli), (or a suitably regularized weighting, see Section 5.4) then the hybrid estimate C( will return answers that are a 
close approximation to the ML solution over the full range of multipoles. This method can be applied to any number of PCL 
estimates, for example, one could include a third PCL estimate with a heuristic intermediate weighting as in equation (58). In 
the examples described in Section 5.4 only two PCL estimates (equal weighting and a regularized inverse variance weighting) 
are used even though the the results would improve slightly by including an intermediate weighting scheme. 

Our final procedure for estimating a power spectrum is therefore as follows: 
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Figure 11. Map showing the hit count distribution for a Planck-hke scanning strategy in Galactic coordinates (see text for details). 
Darker regions close to the ecliptic poles have a higher hit count. 

(i) Compute a QML estimate from a low resolution map at low multipoles. If necessary, this estimate can include a general 
noise matrix Nij. 

(ii) Compute PCL estimates from a high resolution map using a number of weighting schemes, e.g. equal weight per pixel and 
inverse variance weighting. 

(iii) Combine the estimates from steps (i) and (ii) together with their covariances and cross-covariances and solve for the 
hybrid estimate and its covariance matrix as described in Section 4. 

5.4 Numerical examples 

To illustrate the hybrid estimator in the presence of instrumental noise, we have simulated the hit count distribution for a 
Planck-type scanning strategy (Figure 11). In this example, the spin-axis of the spacecraft is aligned along the ecliptic plane, 
but a slow precession of S = 5° sin(20e) is applied as the spacecraft scans in ecliptic longitude <j!>e. Figure 11 was constructed 
for a single detector, pointing at an angle of 85° with respect to the spin-axis, and for a uniform scan rate in 0e. With this 
scanning geometry, the entire sky is scanned as <^e varies from to 2tx and the regions with high hit-count are concentrated 
at the ecliptic poles as shown in Figure 11. If the mean hit-count (Nahs) is normalised to unity, the hit-count distribution in 
Figure 11 varies from a minimum of 0.54 to a maximum of 120. 

Figure 12 compares the diagonal components of the covariance and cross-covariance matrices for deconvolved PCL 
estimates, C^, for a set of 5 x 10* simulations against the analytic formulae given in Section 5.1. The simulations have 
the same parameters as the high resolution map shown in Figure 7 {6c = 0.5°, 6s — 1°) and the noise level was normalized 
to cTp = 1 X 10~* for pixels with the mean hit count. As in previous Sections a Galactic cut of ±10° was imposed. The filled 
circles in Figure 12 shows the square roots of the diagonal components of the covariance and cross-covariance matrices divided 
by the approximate formula (Knox 1995) 



As expected from the analysis of Section 5.1, inverse noise weighting leads to smaller errors than equal weighting at £ ^ 100 
when the noise dominates the error budget. However, inverse variance weighting leads to much poorer estimates of the power 
spectrum at ^ ^ 100. This is caused by the cusp-like structure of the hit count distribution shown in Figure 11. For this 
scanning pattern, the small number of pixels with a high hit counts close to the ecliptic poles dominate the PCL estimates 
with inverse variance weighting, consequently the errors on the PCL estimates at low multipoles are much larger than expected 
from cosmic variance. This scanning strategy was chosen intentionally to illustrate the differences between inverse variance 
weights and equal weights. The differences between these weighting schemes would be smaller for a WMAP-like scanning 
strategy, which produces a smooth variation of the hit counts with ecliptic latitude. 

Another consequence of the cuspiness of the hit count distribution of Figure 11 is that the window function W^^^ (equation 
(15b) for inverse variance weighting is broader the window function for equal weights. For inverse variance weighting, the 




(59) 
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Figure 12. The jjoiiits show the square roots of the diagonal components of the covariaiice and cross-covariance, derived from 5 X 10* 
simulations, of the deconvolved PCL estimates, divided by the simple formula of equation (59). The simulations have the same parameters 
as the high resolution map plotted in Figure 7 but include a uncorrelated noise for a Planck-type scanning pattern as shown in Figure 
11 (see text for details). The results are shown for equal weight per pixel and for inverse variance weighting. The lines show the analytic 
model of equations (49) and (52). 



analytic approximation to the covariance matrix Vui is therefore not as accurate at low nmltipoles as it is for equal weights. 
The full matrices are compared with the analytic formulcie in Figure 13. As expected, the analytic formulae axe in excellent 
agreement with the results from the simulations at high multipoles. However, there are large discrepancies at low multipoles 
in the case of inverse variance weighting that arc clearly visible in Figures 13c and 13d. As described in Section 2.2, the 
covariance matrices (and cross-covariances) at low multipoles can be evaluated accurately by summing over pixels using 
degraded resolution maps (c/ equation 18), but as we will see in the next example, there is no need to compute these 
components for the hybrid estimator. 

As a final illustration of the hybrid estimator, we have analysed a simulation with 0.1° pixels and a beam smoothing 
of Os = 0.2°. The simulated map contains 4.12 x 10® pixels and is therefore comparable in size and resolution to the CMB 
maps produced by WMAP. A Planck- like scanning strategy, as shown in Figure 11, was adopted and the noise level was set 
to CTj, = 2 X 10~^ for pixels with the mean hit count. At this resolution, the cuspiness in the hit count distribution causes 
numerical problems in evaluating the PCL esimator at high multipoles if exact inverse variance weighting is used. These can 
be avoided by introducing a regularising parameter e/, 

in the weight function (c/the heuristic weight function of equation (58)). The analytic dispersions for the deconvolved PCL 
estimates (divided by the approximation of equation 59) are shown in Figure 14a for four weighting schemes. The errors for 
equal weighting grow steadily at I ^ 500, as noise becomes a more important contribution to the errors (the structure in this 
curve &.t t ^ 500, i.e. the changes in slope and the plateaus, are related to the acoustic peak structure in the CMB spectrum). 
The weighting scheme of equation (60) with e/ =0.1 leads to an improvement in the errors at ^ ^ 800, but performs poorly 
at lower multipoles. The weighting with c/ = 0.5 performs almost as well as the e/ =0.1 weighting in the noise dominated 
limit, but performs much better in the intermediate region 500 ^ I ^ 750. We therefore use the equal weights and e/ = 0.5 
PCL estimates in the hybrid estimator. 

The inputs into the hybrid estimator are as follows: 

(i) QML estimates and Fisher matrix for ^ < 40 computed from a degraded resolution map (pixel size of = 3° , smoothing 

= 3°). 

(ii) PCL estimate with equal weight per pixel over the range 2 < £ < 1000 with analytic covariance matrix (equation 49) 
for multipoles t > 30. At lower multipoles, the covariance matrix was computed by summing over pixels (equation (18)) in a 
lower resolution map {dc = 1-5°, de = 1.5°). 
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(a) simulations equal weight (b) analytic equal weight 




50 100 150 50 100 150 





Figure 13. The analytic covariance matrices for the deconvolved PCL estimates compared with the results from 5 X 10"* simulations 
including noise. The upper panel (figures 13a and 13b) shows the covariance matrices for equal weight per pixel. The middle panel 
(figures 13c and 13d) shows the covariance matrices for inverse variance weighting. The lower panel (figures 13e and 13f) shows the 
cross-covariance between the estimates for these two weighting schemes. 



^1 
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(b) £,=0.1 

(a) comparison of weights 




Figure 14. Figure 14a shows the analytic dispersions, normahzcd by equation (59), for various weighting scliernes. Tliesc have been 
computed for a map witli 0.1° pixels, a smoothing of 9s = 0.2°, and a Planck-type noise pattern as shown in Figure 11. The noise level 
was set to (jp = 2 X lO^'"* for a pixel with the average hit count. The parameter ej is the regularizing parameter in the weighting scheme 
of equation (60). The dashed line in Figure 14a was computed from the diagonal components of the covariance matrix for the hybrid 
estimator (equation 45) applied to a data vector consisting of QML estimates over the range 2 < £ < 40, PCL estimates with equal 
weight per pixel over the range 2 < £ < 1000, and PCL estimates with €f = 0.5 weighting over the range 500 < £ < 1000. Figures (b)-(d) 
show the PCL estimates for the same simulation for three diflferent weighting schemes. The solid line shows the power spectrum for this 
particular simulation and the dotted line shows the power spectrum of the fiducial ACDM model. The power spectra in these figures 
have been corrected to zero beam width. Note that the abscissae in all of these plots uses a mixed logarithmic-linear scale in £, so that 
the behaviour of the power spectrum at low multipoles is clearly visible; the scale is logarithmic over the rangle 2 < ^ < 20 and linear 
at higher multipoles. 

(iii) PCL estimates using weights with e/ = 0.5 (equation 60) over the range 500 < £ < 1000 together with the analytic 
covariance matrix. 

(iv) The cross-covariances between equal weight PCL estimates and QML estimates (equation 41). 

(v) The cross-covariances between equal weight and e/ = 0.5 PCL estimates, computed from equation (49) with elements at 
(£,£') < 100 set to zero to avoid any spurious coupUng from inaccuracies in the analytic expression at low multipoles. 

The hybrid estimates are plotted in Figure 15 using the same logarithmic-linear abscissa as in Figure 14. At low multipoles, 

the power spectrum estimates reproduce almost exactly the input power spectrum for this simulation (solid line) as in the 
noise-free example shown in Figure 9. The estimates then match smoothly to the equal weight PCL estimates (Figure 14d) at 
£ > 40, and then match smoothly on to the £/ = 0.5 PCL estimates (Figure 14c) at £ ^ 700. The dispersions of these estimates 
computed from equation (45) are plotted as the dashed line in Figure 14b. This figure shows that the hybrid estimates are 
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hybrid 




2 5 10 20 250 500 750 1000 

I 

Figure 15. The filled circles show the hybrid estimate for the high resolution simulation, including noise, used for the PCL estimates 
in Figure 14. As in Figure 14 the dotted line shows the fiducial ACDM power spectrum and the solid line shows the actual power 
spectrum computed from the a^rn coefficients used to generate the map. The hybrid estimator was applied to a data vector consisting of 
QML estimates over the range 2 < i < 40, PCL estimates with equal weight per pixel over the range 2 < £ < 1000, and PCL estimates 
with e/ = 0.5 weighting over the range 500 < £ < 1000. The hybrid estimator therefore returns almost the exact input power spectrum 
up to ^ = 40, smoothly matches on to the equal variance PCL estimates up to ^ ~ 700 and then smoothly matches to the e/ = 0.5 PCL 
estimates at higher multipoles. 

close to optimal over the full range of multipoles 2 < £ < 1000. There would be some small improvement in extending the 
QML estimator to higher multipoles, since the QML errors have not quite converged to the PCL errors hy £ = 40. It would 
also be possible improve the errors over the multipolo range 500 ^ £ ^ 700 by adding a third PCL estimate with e/ ~ 1 into 
the hybrid estimator, but any improvement will be small. The covariance matrix for the hybrid estimator is almost exactly 
diagonal at low multipoles, as in the example shown in Figure 10, and becomes band diagonal at higher multipoles, as in the 
examples shown in Figures 13a and 13c. 

6 CONCLUSIONS 

As explained in the Introduction, a number of methods of estimating power spectra have boon discussed over the last few 
years. These range from the fgist PCL estimators, first applied to cosmology in the 1960's, to more sophisticated 0{N^) ML 
estimators. The Introduction reviewed these methods and contrasted their strengths and weaknesses. We concluded that for 
the analysis of realistic data, what is required is a fast power spectrum estimator that returns a reliable covariance matrix. 
Such an estimator can then be used as part of a Monte Carlo chain to assess the effects of various complex sources of errors, 
unavoidable in real experiments, on the power spectrum estimates and their covariances. This point of view is consistent with 
the MASTER** method of Hivon et al. (2002) and with the analysis of the WMAP data described by Hinshaw et al. (2003), 
both of which are based on fast PCL estimators. 

In this paper, an unbiased hybrid estimator was developed that combines QML estimates at low multipoles and PCL 
estimates with various weightings at higher multipoles. A number of analytic results wore derived showing the statistical 
equivalence of QCL and PCL estimators in the noise-free and noise-dominated limits. Expressions for the covariance matrices 
and their cross-covariances were derived and tested against numerical simulations. The hybrid estimator was illustrated for a 

** Monte Carlo Apodised Spherical Transform Estimator 
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high resolution CMB experiment {9s = 0.2°) and shown to reproduce the power spectrum with close to optimal errors over 
the full range of multipoles 2 < £ < 1000 for a Planck-type noise pattern. The method is fast (the timing is dominated by the 
0{Nd)^^^ operation count for fast spherical transforms) and returns an accurate and nearly band-diagonal covariancc matrix 
over the full range of multipoles. The hybrid estimator therefore fulfills the criteria discussed in the previous paragraph for a 
fast power spectrum estimator, with a simple covariance matrix, suitable for inclusion in a Monte Carlo chain. 

In the numerical examples dcscri1)c'(l in this paper, the instrumental noise was assumed to be diagonal. This is, of course, 
unrealistic and is the main limitation of the analysis presented in this paper. The instrumental (and other) sources of noise 
will have l//-type noise properties. When coupled with a Planck- type scanning strategy, l//-type instrumental noise will 
lead to correlated noise on the sky (in the form of stripes in the sky maps) . The effects of striping can be reduced during the 
map-making stage, either by applying an optimal maximum likelihood map-making algorithm {e.g. Wright 1996; Dore et al, 
2001b, Natoli et al, 2001) or by applying a simpler but sub-optimal destriping algorithm {e.g. Delabrouille J, 1998; Keihanen 
et at, 2003). The QML estimates can, of course, handle correlated pixel noise described by the general noise matrix Nij of 
equation (53). Thus a simple way of accounting for 1/f detecter noise on the low CMB multipoles is to construct a degraded 
resolution map directly from the TOD, together with the full noise covariance matrix for the degraded resolution maps. It 
is computationally feasible to produce such low resolution maps ( ^ 10'' pixels) and covariancc matrices using maximum 
likelihood map making methods from Planck-sized TODs (Borrill, private communication). The low resolution data, which 
need only be computed once, can then be fed as inputs into the QML estimator as described in Section 5. 

This type of analysis could eliminate the requirement in the MASTER method for the calibration of power spectrum 
transfer functions via simulations (see equation 15 of Hivon et al. 2002). In addition, the covariance matrix of the hybrid 
estimator would reflect the effects of correlated noise at low multipoles accurately, since it uses the QML Fisher matrix as an 
input. However, in this approach, the effects of 1// noise would not bo properly included in the PCL estimates. Fortunately, 
for the parameters of the Planck instruments, the effects of 1/f noise should be confined to low multipoles {£ ^ 100, see 
e.g. Keihanen et al. 2003) and the pixel-pixel noise should be strongly diagonally dominated {e.g. Stompor and White 2003) 
suggesting that PCL estimates and the analytic estimates of their covariance matrix should be accurate. The effects of realistic 
1/f noise on the hybrid estimator clearly need to tested against numerical simulations. 

In the examples shown in this paper, the QML estimator was applied to low resolution maps with a maximum of 3842 
pixels. However, it is possible to analyse larger maps within a Monte Carlo chain. For any particular sky cut and noise 
covariance matrix, the matrices Eij, the Fisher matrix and the cross covariances with PCL estimates need only be computed 
once and stored. The evaluation of the QML estimates for different realisations of the data vector Xi then requires only the 
evaluation of traces (equation 53) , provided that the matrices can be read into the computer memory. The main limitation is 
likely to be set by the size of the computer memory, rather than the cpu time available. This time-saving trick was used to 
evaluate the QML estimates from the large number of simulations used to construct Figure 5. 

Throughout this paper we have assumed spherically symmetric beams and that the beam profiles are known exactly. 
Neither of these assumptions will bo true in practice. Uncertainties in the beam shapes would affect the PCL estimates at 
high multipoles. Such uncertainties can be incorporated in the PCL covariance matrix as described by Hinshaw et al. (2003). 
Dealing with far-field beam asymmetries is more problematic as a calibration of a PCL estimator would require simulations 
of the scanning strategy with a full Att convolution of the beam pattern on the sky (Wandelt and Gorski 2001). For a mission 
such as Planck, the beam asymmetries are small enough that an analysis of the CMB power spectrum in terms of an effective 
spherically symmetric beam should be adequate. As with 1/f noise, the effects of beam asymmetries on the Planck mission 
need to be checked against numerical simulations. 

The principal mythology that this paper is designed to dispel is that a full 0{N^) maximum likelihood solution will 
produce some magical improvement in estimates of a power spectrum. The truth is that for realistic experiments, any benefits 
over the hybrid estimator are likely to be small (c/ Figure 15) and overwhelmed by 'real word' uncertainties such as beam 
calibrations, foreground separation, point source subtraction etc. The hybrid estimator leads to an unbiased, nearly optimal, 
power spectrum estimator with an accurate and almost diagonal covariance matrix. The benefits of being able to assess such 
real world complexities using a hybrid estimator within a Monte Carlo chain, and hence to quantify any corrections to the 
covariance matrix, are likely to more than offset any marginal improvements that might be gained from a full 0{Ndf' ML 
solution. Since the hybrid estimator is fast, and the covariance matrix is nearly diagonal, it is possible to test simple parametric 
scalings of the covariance matrix to produce a good approximation to the of equation (23) for different theoretical jjowor 
spectra. It is also possible to test parameterizations of the shape of the likelihood function, rather than using the model 
of equation (23). (This should not require large numbers of simulations if the covariance matrix of the hybrid estimator is 
diagonally dominant as in the examples discussed in Section 5.) Both of these tests are important for the unbiased estimation 
of cosmological parameters (see e.g. Verde et al. 2003) and will become of still greater importance for the analysis of high 
precision experiments such as Planck. 

The hybrid estimator can be generalised to handle other types of data, for example, galaxy clustering, weak gravitational 
lensing and CMB polarization. The extension to CMB polarization will be discussed in a future paper. 
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